Extending the Fellegi-Holt Model of Statistical Data Editing
Authors
Abstract
This paper provides extensions to the theory and the computational aspects of the Fellegi-Holt Model of Editing (JASA 1976). If implicit edits can be generated prior to editing, then error localization (finding the minimum number of fields to impute) can be quite rapid. In some situations, not all of the implicit edits can be generated because of the great number (> 10^30) of distinct edit patterns. The ideas in this paper are intended to determine more rapidly the approximate minimal number of fields to change in situations where not all implicit edits can be generated prior to editing. As a special case, the formal validity of Bankier’s Nearest-Neighbour Imputation Method (NIM) is demonstrated.
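As a rough illustration of what error localization means, the sketch below searches by brute force for a smallest set of fields whose values can be changed so that a record passes every edit. It is not the algorithm of this paper, nor of any system named here. The field names, domains, the two edits, and the functions `passes` and `localize` are all hypothetical, invented for the example. Exhaustive search of this kind is only feasible for toy problems, which is why implicit edits matter in practice.

```python
from itertools import combinations, product

# Hypothetical toy domains, invented for illustration only.
DOMAINS = {
    "age": ["child", "adult"],
    "marital": ["single", "married"],
    "relation": ["head", "spouse", "child"],
}

# Hypothetical explicit edits. Each returns True when the record FAILS,
# i.e. when it contains an inadmissible combination of values.
EDITS = [
    lambda r: r["age"] == "child" and r["marital"] == "married",
    lambda r: r["marital"] == "single" and r["relation"] == "spouse",
]


def passes(record):
    """A record passes when no edit flags it."""
    return not any(edit(record) for edit in EDITS)


def localize(record):
    """Return (fields, repaired_record) for a smallest set of fields whose
    values can be changed so that the record passes all edits.

    Subsets are tried in order of increasing size, so the first hit is
    minimal in the number of fields changed. Returns None if no repair exists.
    """
    fields = list(DOMAINS)
    for k in range(len(fields) + 1):
        for subset in combinations(fields, k):
            # Try every assignment of domain values to the chosen fields.
            for values in product(*(DOMAINS[f] for f in subset)):
                candidate = {**record, **dict(zip(subset, values))}
                if passes(candidate):
                    return subset, candidate
    return None


record = {"age": "child", "marital": "married", "relation": "spouse"}
fields, repaired = localize(record)
print(fields, repaired)
```

In this toy record only the first edit fails explicitly, and that edit involves `age` and `marital`. Changing `marital` alone does not help, because `single` then triggers the second edit. The two explicit edits together imply a third edit that does not mention `marital` at all: a child cannot be a spouse. A procedure that has this implicit edit available can tell directly that `age` or `relation` must change, without enumerating candidate repairs.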
Related resources
State of Statistical Data Editing and Current Research Problems
1. INTRODUCTION This paper is my description of the state of statistical data editing and current research problems. It is not intended to be a complete description of all areas. Rather, it represents sub-areas of statistical data editing that I will describe in sufficient detail so that the discussion of a few research problems is more easily understood. I define statistical data editing (SDE)...
Data Quality: Automated Edit/Imputation and Record Linkage
Statistical agencies collect data from surveys and create data warehouses by combining data from a variety of sources. To be suitable for analytic purposes, the files must be relatively free of error. Record linkage (Fellegi and Sunter, JASA 1969) is used for identifying duplicates within a file or across a set of files. Statistical data editing and imputation (Fellegi and Holt, JASA 1976) are ...
A Comparison Study of ACS If-Then-Else, NIM, and DISCRETE Edit and Imputation Systems Using ACS Data
In any statistical survey, the information gathered may contain inconsistent, incorrect, or missing data. These erroneous data need to be revised or filled in prior to data tabulations and retrieval. The revisions of the erroneous data should not affect the statistical inferences of the data. The missing data, as well as some inconsistent or incorrect data, are easy to identify while others are n...
Editing Discrete Data
This paper describes theory, computational algorithms, and software associated with the DISCRETE edit system. The prototype DISCRETE edit system is based on the Fellegi-Holt model (JASA 1976) of editing. A new implicit-edit generation algorithm replaces an algorithm of Garfinkel, Kunnathur, and Liepins (Operations Research 1986). A characterization specific to the edit situation reduces the amo...
5 Statistical Data Editing
Statistical Data Editing (SDE) is the process of checking data for errors and correcting them. Winkler (1999) defined it as the set of methods used to edit (i.e., clean up) and impute (fill in) missing or contradictory data. The result of SDE is data that can be used for analytic purposes. Editing literature goes back to the 1960s with the contributions of Nordbotten (1965), Pritzker, et al. (1...